NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Token Turing Machines are Efficient Vision Models

https://doi.org/10.48550/arXiv.2409.07613

Jajal, Purvish; Eliopoulos, Nick John; Chou, Benjamin Shiue-Hal; Thiruvathukal, George K; Davis, James C; Lu, Yung-Hsiang (February 2025, The Computer Vision Foundation.)

We propose Vision Token Turing Machines (ViTTM), an efficient, low-latency, memory-augmented Vision Transformer (ViT). Our approach builds on Neural Turing Machines and Token Turing Machines, which were applied to NLP and sequential visual understanding tasks. ViTTMs are designed for non-sequential computer vision tasks such as image classification and segmentation. Our model creates two sets of tokens: process tokens and memory tokens; process tokens pass through encoder blocks and read-write from memory tokens at each encoder block in the network, allowing them to store and retrieve information from memory. By ensuring that there are fewer process tokens than memory tokens, we are able to reduce the inference time of the network while maintaining its accuracy. On ImageNet-1K, the state-of-the-art ViT-B has median latency of 529.5ms and 81.0% accuracy, while our ViTTM-B is 56% faster (234.1ms), with 2.4 times fewer FLOPs, with an accuracy of 82.9%. On ADE20K semantic segmentation, ViT-B achieves 45.65mIoU at 13.8 frame-per-second (FPS) whereas our ViTTM-B model acheives a 45.17 mIoU with 26.8 FPS (+94%).
more » « less
Free, publicly-accessible full text available February 28, 2026
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge

https://doi.org/10.1109/WACV61041.2025.00695

Eliopoulos, Nicholas John; Jajal, Purvish; Davis, James C; Liu, Gaowen; Thiravathukal, George K; Lu, Yung-Hsiang (February 2025, IEEE)

Free, publicly-accessible full text available February 26, 2026
Can Large-Language Models Help us Better Understand and Teach the Development of Energy-Efficient Software?

Hasler, Ryan; Läufer, Konstantin; Thiruvathukal, George K; Peng, Huiyun; Robinson, Kyle; Davis, Kirsten; Lu, Yung‑Hsiang; Davis, James C (October 2024, arXiv)

Computing systems are consuming an increasing and unsustainable fraction of society’s energy footprint, notably in data centers. Meanwhile, energy-efficient software engineering techniques are often absent from undergraduate curricula. We propose to develop a learning module for energy-efficient software, suitable for incorporation into an undergraduate software engineering class. There is one major problem with such an endeavor: undergraduate curricula have limited space for mastering energy-related systems programming aspects. To address this problem, we propose to leverage the domain expertise afforded by large language models (LLMs). In our preliminary studies, we observe that LLMs can generate energy-efficient variations of basic linear algebra codes tailored to both ARM64 and AMD64 architectures, as well as unit tests and energy measurement harnesses. On toy examples suitable for classroom use, this approach reduces energy expenditure by 30–90%. These initial experiences give rise to our vision of LLM-based metacompilers as a tool for students to transform high-level algorithms into efficient, hardware-specific implementations. Complementing this tooling, we will incorporate systems thinking concepts into the learning module so that students can reason both locally and globally about the effects of energy optimizations.
more » « less
Full Text Available
Token Turing Machines are Efficient Vision Models

https://doi.org/10.1109/WACV61041.2025.00767

Jajal, Purvish; Eliopoulos, Nick John; Chou, Benjamin Shiue-Hal; Thiravathukal, George K; Davis, James C; Lu, Yung-Hsiang (February 2025, IEEE)

Free, publicly-accessible full text available February 26, 2026
Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions

Peng, Huiyun; Gupte, Arjun; Eliopoulos, Nicholas J; Ho, Chien‑Chou; Mantri, Rishi; Deng, Leo; Jiang, Wenxin; Lu, Yung‑Hsiang; Läufer, Konstantin; Thiruvathukal, George K; et al (October 2024, arXiv)

Energy-efficient software helps improve mobile de- vice experiences and reduce the carbon footprint of data centers. However, energy goals are often de-prioritized in order to meet other requirements. We take inspiration from recent work exploring the use of large language models (LLMs) for different software engineering activities. We propose a novel application of LLMs: as code optimizers for energy efficiency. We describe and evaluate a prototype, finding that over 6 small programs our system can improve energy efficiency in 3 of them, up to 2x better than compiler optimizations alone. From our experience, we identify some of the challenges of energy-efficient LLM code optimization and propose a research agenda.
more » « less
Full Text Available
Visualize Music Using Generative Arts

https://doi.org/10.1109/CAI59869.2024.00273

Ng, Brian Man-Kit; Rose_Sudhoff, Samantha; Li, Haichang; Kamphuis, Joshua; Nadolsky, Tim; Chen, Yingjie; Yun, Kristen Yeon-Ji; Lu, Yung-Hsiang (June 2024, IEEE)

Music is one of the most universal forms of communication and entertainment across cultures. This can largely be credited to the sense of synesthesia, or the combining of senses. Based on this concept of synesthesia, we want to explore whether generative AI can create visual representations for music. The aim is to inspire the user’s imagination and enhance the user experience when enjoying music. Our approach has the following steps: (a) Music is analyzed and classified into multiple dimensions (including instruments, emotion, tempo, pitch range, harmony, and dynamics) to produce textual descriptions. (b) The texts form inputs of machine models that can predict the genre of the input audio. (c) The prompts are inputs of generative machine models to create visual representations. The visual representations are continuously updated as the music plays, ensuring that the visual effects aptly mirror the musical changes. A comprehensive user study with 88 users confirms that our approach is able to generate visual art reflecting the music pieces. From a list of images covering both abstract images and realistic images, users considered that our system-generated images can better represent pieces of music than human-chosen images. It suggests that generative arts can become a promising method to enhance users' listening experience while enjoying music. Our method provides a new approach to visualize music and to enjoy music through generative arts.
more » « less
Full Text Available
Securing Deep Neural Networks on Edge from Membership Inference Attacks Using Trusted Execution Environments

Yang, Cheng-Yun; Ramshankar, Gowri; Nambiar, Sudarshan; Miller, Evan; Zhang, Xun; Eliopoulos, Nicholas; Jajal, Purvish; Jing_Tian, Dave; Chen, Shuo-Han; Perng, Chiy-Ferng; et al (August 2024, 2024 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED))

Full Text Available
An automated approach for improving the inference latency and energy efficiency of pretrained CNNs by removing irrelevant pixels with focused convolutions

https://doi.org/10.1109/ASP-DAC58780.2024.10473884

Tung, Caleb; Eliopoulos, Nicholas; Jajal, Purvish; Ramshankar, Gowri; Yang, Cheng-Yun; Synovic, Nicholas; Zhang, Xuecen; Chaudhary, Vipin; Thiruvathukal, George K; Lu, Yung-Hsiang (January 2024, Asia and South Pacific Design Automation Conference (ASP-DAC))

Computer vision often uses highly accurate Convolutional Neural Networks (CNNs), but these deep learning models are associated with ever-increasing energy and computation requirements. Producing more energy-efficient CNNs often requires model training which can be cost-prohibitive. We propose a novel, automated method to make a pretrained CNN more energyefficient without re-training. Given a pretrained CNN, we insert a threshold layer that filters activations from the preceding layers to identify regions of the image that are irrelevant, i.e. can be ignored by the following layers while maintaining accuracy. Our modified focused convolution operation saves inference latency (by up to 25%) and energy costs (by up to 22%) on various popular pretrained CNNs, with little to no loss in accuracy
more » « less
Full Text Available
Evolution of Winning Solutions in the 2021 Low-Power Computer Vision Challenge

https://doi.org/10.1109/MC.2023.3250246

Hu, Xiao; Jiao, Ziteng; Kocher, Ayden; Wu, Zhenyu; Liu, Junjie; Davis, James C; Thiruvathukal, George K; Lu, Yung-Hsiang (August 2023, Computer)

Full Text Available
Tree-Based Unidirectional Neural Networks for Low-Power Computer Vision

https://doi.org/10.1109/MDAT.2022.3217016

Goel, Abhinav; Tung, Caleb; Eliopoulos, Nick; Thiruvathukal, George K.; Wang, Amy; Lu, Yung-Hsiang; Davis, James C. (June 2023, IEEE Design & Test)

Full Text Available

« Prev Next »

Search for: All records